1.  THE  METHOD  OF  CONJUGATE  GRADIENTS


The conjugate gradient method [1] is an efficient procedure for unconstrained optimization problems of the type

minimize $f(x_1, x_2, \ldots, x_n)$

where $f(x)$ is a suitable (differentiable, and preferably convex) function on $E^n$. In particular, if $f(x) = c + p^T x + \tfrac{1}{2} x^T Q x$ and $Q$ is an $n$th-order symmetric and positive semidefinite matrix (thus $f(x)$ is convex), then the conjugate gradient method will terminate at the solution in at most $n$ steps, provided the standard starting procedure is used. A statement of the conjugate gradient method for this function is:


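Given $x_0$, let $d_0 = -\nabla f(x_0)$.

For $k \ge 0$, given $x_k$ and $d_k$, let

$$x_{k+1} = x_k + t_k d_k,$$

where $t_k$ is the value of $t$ minimizing $f(x_k + t d_k)$.

If $\nabla f(x_{k+1}) = 0$, stop. Otherwise, let

$$d_{k+1} = -\nabla f(x_{k+1}) + s_k d_k,$$

where $s_k$ is chosen so that $d_{k+1}^T Q d_k = 0$.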


Let $g_k = \nabla f(x_k)$ for all $k$. Noting that

$$g_{k+1} = \nabla f(x_{k+1}) = p + Q(x_k + t_k d_k) = p + Q x_k + t_k Q d_k = g_k + t_k Q d_k,$$

we can write the recursion as:

$$d_0 = -g_0. \tag{1}$$

For $k \ge 0$,

$$g_{k+1} = g_k + t_k Q d_k, \qquad d_{k+1} = -g_{k+1} + s_k d_k, \tag{2}$$

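where

$$t_k = -\frac{g_k^T d_k}{d_k^T Q d_k} \tag{3}$$

and

$$s_k = \frac{g_{k+1}^T Q d_k}{d_k^T Q d_k}, \tag{4}$$

or, equivalently (since $g_{k+1} - g_k = t_k Q d_k$),

$$s_k = \frac{g_{k+1}^T (g_{k+1} - g_k)}{d_k^T (g_{k+1} - g_k)}. \tag{5}$$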
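Because $f$ is quadratic, the recursion (1)-(4) can be run on the gradients alone; when $Q$ is nonsingular the points are recovered from $Q x_k = g_k - p$. The following is a minimal Python sketch under that reading, using plain lists and no external libraries. The function name `cg_recursion`, the tolerance `tol`, and the test matrix are illustrative choices, not part of the original statement. With the standard start it should stop after at most $n$ steps, up to roundoff.

```python
import math


def dot(u, v):
    """Inner product u^T v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(Q, v):
    """Product Q v, with Q given as a list of rows."""
    return [dot(row, v) for row in Q]


def cg_recursion(Q, g0, d0=None, steps=None, tol=1e-12):
    """Run recursion (1)-(4) on the gradients of f(x) = c + p^T x + x^T Q x / 2.

    If d0 is None, the standard start d_0 = -g_0 of formula (1) is used.
    Returns the lists of g_k, d_k, t_k and s_k that were computed.
    """
    steps = len(g0) if steps is None else steps
    g = list(g0)
    d = [-gi for gi in g] if d0 is None else list(d0)
    gs, ds, ts, ss = [g], [d], [], []
    for _ in range(steps):
        if math.sqrt(dot(g, g)) <= tol:              # g_k = 0: the minimizer is reached
            break
        Qd = matvec(Q, d)
        dQd = dot(d, Qd)
        t = -dot(g, d) / dQd                         # formula (3)
        g = [gi + t * qi for gi, qi in zip(g, Qd)]   # g_{k+1} = g_k + t_k Q d_k
        s = dot(g, Qd) / dQd                         # formula (4)
        d = [-gi + s * di for gi, di in zip(g, d)]   # d_{k+1} = -g_{k+1} + s_k d_k
        gs.append(g)
        ds.append(d)
        ts.append(t)
        ss.append(s)
    return gs, ds, ts, ss


if __name__ == "__main__":
    # An illustrative symmetric positive definite 3 x 3 matrix and starting gradient.
    Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    gs, ds, ts, ss = cg_recursion(Q, [1.0, 1.0, 1.0])
    print("steps:", len(ts), " final |g|:", math.sqrt(dot(gs[-1], gs[-1])))
```

The loop checks $g_k = 0$ before each step, so a zero starting gradient never produces a division by $d^T Q d = 0$.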

The formula (5) for $s_k$ is essentially formula (3:2b) of Hestenes and Stiefel, rather than the more commonly used formula (3:1e), which in our notation is $s_k = |g_{k+1}|^2 / |g_k|^2$. As they subsequently show, the former gives better protection against the accumulation of roundoff error. More importantly, it ensures that $d_{k+1}^T Q d_k = 0$ for each $k$, independently of whether the other steps have been carried out accurately, which the latter formula does not. If all the needed relations do hold accurately, it can be shown that the successive directions $d_0, d_1, \ldots$ are all linearly independent and conjugate (that is, $d_j^T Q d_k = 0$ for $j \ne k$), and that $x_k$ minimizes the function $f$ on the affine set passing through $x_0$ and spanned by $d_0, d_1, \ldots, d_{k-1}$. Consequently the procedure must terminate with $g_k = 0$ for some $k \le n$.
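The robustness claim can be observed on a single step. The Python sketch below starts from the standard start $d_0 = -g_0$, deliberately multiplies the exact $t_0$ of formula (3) by a factor `t_scale` (a hypothetical perturbation standing in for an inaccurate line search), and reports $d_1^T Q d_0$ when $s_0$ is taken from formula (4) and from formula (3:1e). The matrix, the starting gradient, and the rule labels are assumptions of the sketch.

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def matvec(Q, v):
    """Product Q v, with Q given as a list of rows."""
    return [dot(row, v) for row in Q]


def conjugacy_after_one_step(Q, g0, t_scale, rule):
    """Return d_1^T Q d_0 after one step from the standard start d_0 = -g_0.

    t_scale multiplies the exact step of formula (3) to mimic an inaccurate line search.
    rule = "formula4":    s_0 = g_1^T Q d_0 / d_0^T Q d_0   (formula (4))
    rule = "formula3_1e": s_0 = |g_1|^2 / |g_0|^2           (formula (3:1e))
    """
    d0 = [-gi for gi in g0]
    Qd0 = matvec(Q, d0)
    dQd = dot(d0, Qd0)
    t0 = t_scale * (-dot(g0, d0) / dQd)              # perturbed formula (3)
    g1 = [gi + t0 * qi for gi, qi in zip(g0, Qd0)]   # first half of (2)
    if rule == "formula4":
        s0 = dot(g1, Qd0) / dQd
    elif rule == "formula3_1e":
        s0 = dot(g1, g1) / dot(g0, g0)
    else:
        raise ValueError("unknown rule: " + rule)
    d1 = [-gi + s0 * di for gi, di in zip(g1, d0)]   # second half of (2)
    return dot(d1, Qd0)                              # d_1^T Q d_0 (Q symmetric)


if __name__ == "__main__":
    Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    g0 = [1.0, -2.0, 3.0]
    for rule in ("formula4", "formula3_1e"):
        print(rule, conjugacy_after_one_step(Q, g0, 1.01, rule))
```

With `t_scale = 1.0` both rules give $d_1^T Q d_0 \approx 0$; with a one-percent error in $t_0$ only formula (4) keeps it at roundoff level.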


2.  THE  PROBLEM 


It is important to note that both the starting condition (1) and the determinations (3, 4) of the coefficients $t_k$ and $s_k$ must be observed precisely in order that the above termination ensue; it cannot be shown otherwise. Indeed, failure to choose a "standard start" (one in which $d_0$ is parallel to $g_0$) makes it impossible to retain the conjugacy relation $d_j^T Q d_k = 0$ for $|j - k| \ge 2$ using formulas of the type of (2). (We have seen this fact overlooked in some reports in the literature, leading to an overestimate of the convergence rate of the method.) Since, however, the procedure is almost invariably used under circumstances in which the condition $g_n = 0$ cannot be precisely met (with the quadratic problem in an environment of roundoff error, and, more significantly, in extensions of the method to nonquadratic problems, such as that due to Fletcher and Reeves [2]), provision for continuing after the $n$th step must be made. It has generally been recognized as good practice to restart the procedure after $n$ (or possibly $n+1$ or $n+2$) iterations; that is, to begin all over again, using the latest point found as the new $x_0$, and thus to rebuild a new set of conjugate directions.
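The loss of conjugacy under a nonstandard start shows up after only two steps. The Python sketch below runs (2)-(4) with $Q = \mathrm{diag}(1, 2, 3)$ and $g_0 = (1, 1, 1)^T$, once from the standard start and once from the arbitrarily chosen $d_0 = (-1, -2, 0)^T$; these data are illustrative and not taken from the study. Working the nonstandard case by hand gives $d_0^T Q d_2 = -102/293$ exactly, while the adjacent relation $d_1^T Q d_2 = 0$ is still enforced by (4).

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def matvec(Q, v):
    """Product Q v, with Q given as a list of rows."""
    return [dot(row, v) for row in Q]


def directions(Q, g0, d0, steps):
    """Return [d_0, ..., d_steps] generated by recursion (2)-(4) from (g0, d0)."""
    g, d = list(g0), list(d0)
    ds = [d]
    for _ in range(steps):
        Qd = matvec(Q, d)
        dQd = dot(d, Qd)
        t = -dot(g, d) / dQd                         # formula (3)
        g = [gi + t * qi for gi, qi in zip(g, Qd)]   # g_{k+1}
        s = dot(g, Qd) / dQd                         # formula (4)
        d = [-gi + s * di for gi, di in zip(g, d)]   # d_{k+1}
        ds.append(d)
    return ds


def q_inner(Q, u, v):
    """The Q-inner product u^T Q v."""
    return dot(u, matvec(Q, v))


if __name__ == "__main__":
    Q = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
    g0 = [1.0, 1.0, 1.0]
    std = directions(Q, g0, [-1.0, -1.0, -1.0], 2)   # standard start d_0 = -g_0
    non = directions(Q, g0, [-1.0, -2.0, 0.0], 2)    # nonstandard start
    print("standard    d0'Q d2 =", q_inner(Q, std[0], std[2]))
    print("nonstandard d0'Q d2 =", q_inner(Q, non[0], non[2]))
    print("nonstandard d1'Q d2 =", q_inner(Q, non[1], non[2]))
```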

The purpose of the study described here was to determine whether restarting was, in fact, necessary, or whether the procedure could be continued indefinitely without restarting and not suffer. We have concluded that restarting is necessary for quick convergence. Indeed, we have an example (for $n = 3$) of a quadratic problem which shows that convergence can be no better than linear when a nonstandard start is used (while, of course, a standard start or restart would cause termination in at most three iterations).
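A restart in the sense described above amounts to resetting $d_k = -g_k$ every $n$ steps. The Python sketch below compares $2n$ steps of the continued method with $n$ steps followed by a restart, both from the same nonstandard start ($Q = \mathrm{diag}(1, 2, 3)$, $g_0 = (1, 1, 1)^T$, $d_0 = (-1, -2, 0)^T$, chosen for this sketch and not from the study). The parameter `restart_every` is a hypothetical name. In exact arithmetic the restarted run reaches $g = 0$ within $n$ steps of the restart, while the continued run is still only decreasing.

```python
import math


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def matvec(Q, v):
    """Product Q v, with Q given as a list of rows."""
    return [dot(row, v) for row in Q]


def run_cg(Q, g0, d0, steps, restart_every=None, tol=1e-15):
    """Run recursion (2)-(4) for `steps` steps from (g0, d0) and return the last gradient.

    If restart_every is set, d_k is reset to -g_k whenever k is a positive
    multiple of restart_every (a restart from the latest point).
    """
    g, d = list(g0), list(d0)
    for k in range(steps):
        if math.sqrt(dot(g, g)) <= tol:                  # already at the minimizer
            break
        if restart_every and k > 0 and k % restart_every == 0:
            d = [-gi for gi in g]                        # restart: standard start at x_k
        Qd = matvec(Q, d)
        dQd = dot(d, Qd)
        t = -dot(g, d) / dQd                             # formula (3)
        g = [gi + t * qi for gi, qi in zip(g, Qd)]       # g_{k+1}
        s = dot(g, Qd) / dQd                             # formula (4)
        d = [-gi + s * di for gi, di in zip(g, d)]       # d_{k+1}
    return g


if __name__ == "__main__":
    Q = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
    g0, d0, n = [1.0, 1.0, 1.0], [-1.0, -2.0, 0.0], 3
    cont = run_cg(Q, g0, d0, 2 * n)
    rest = run_cg(Q, g0, d0, 2 * n, restart_every=n)
    print("continued |g| =", math.sqrt(dot(cont, cont)))
    print("restarted |g| =", math.sqrt(dot(rest, rest)))
```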




3.  THE  EXAMPLE:  CONVERGENCE  IS  AT  BEST  LINEAR 

We have run about fifty steps of the continued conjugate gradient method as defined by equations (2-4) above on each of some one hundred quadratic, three-variable problems, examining graphically the ratios $f(x_{k+1})/f(x_k)$ of successive values of the function $f(x) = \tfrac{1}{2} x^T Q x$. In about half of the trials, $Q$ was the diagonal matrix whose eigenvalues are $(0.1, 1, 1)$; the starting vectors $g_0$ and $d_0$ were chosen randomly. In every case the ratios, while at first seemingly randomly scattered between 0 and 1, were found to lie in a rather definitely marked interval $[a, b]$ with $0 < a < b < 1$. In many cases it appeared that something very much like a sine curve having a period between three and five steps could be fitted to the set of successive ratios. After considerable experimentation with the starting data, we found an example in which the ratios were constant. The other data of the procedure then exhibited a remarkable periodicity, and the discovery of simple relationships among these led to the following example:
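An experiment of this kind can be sketched in a few lines of Python (standard library only). The random generator, the seed, the Gaussian starting vectors, and the number of steps are assumptions of this sketch; it does not reproduce the original trials. Since $x = Q^{-1} g$, the function value is computed from the gradient as $f = \tfrac{1}{2} g^T Q^{-1} g$, which avoids forming $x$.

```python
import random


def ratios_of_continued_cg(q, g0, d0, steps):
    """Ratios f(x_{k+1}) / f(x_k) for recursion (2)-(4) with Q = diag(q).

    f(x) = x^T Q x / 2 is evaluated from the gradient g = Q x as g^T Q^{-1} g / 2.
    """
    def f_of(g):
        return 0.5 * sum(gi * gi / qi for gi, qi in zip(g, q))

    g, d = list(g0), list(d0)
    ratios = []
    for _ in range(steps):
        Qd = [qi * di for qi, di in zip(q, d)]
        dQd = sum(di * qdi for di, qdi in zip(d, Qd))
        t = -sum(gi * di for gi, di in zip(g, d)) / dQd           # formula (3)
        g_next = [gi + t * qdi for gi, qdi in zip(g, Qd)]         # g_{k+1}
        ratios.append(f_of(g_next) / f_of(g))
        s = sum(gi * qdi for gi, qdi in zip(g_next, Qd)) / dQd    # formula (4)
        d = [-gi + s * di for gi, di in zip(g_next, d)]           # d_{k+1}
        g = g_next
    return ratios


if __name__ == "__main__":
    rng = random.Random(0)                      # arbitrary seed for this sketch
    q = [0.1, 1.0, 1.0]                         # eigenvalues used in about half the trials
    g0 = [rng.gauss(0.0, 1.0) for _ in q]       # random (nonstandard) starting data
    d0 = [rng.gauss(0.0, 1.0) for _ in q]
    for k, ratio in enumerate(ratios_of_continued_cg(q, g0, d0, 30)):
        print(k, round(ratio, 4))
```

By the theorem of Section 4, every ratio from the second step on should stay below $(9/11)^2 \approx 0.669$, the steepest descent factor for condition number 10.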


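let

$$Q = \begin{pmatrix} 0.1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad g_0 = (1, -\sqrt{5}, 0)^T / \sqrt{6}, \qquad d_0 = (-10\sqrt{5}, 14, -3\sqrt{6})^T / (4\sqrt{30}).$$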


One step of the method given by equations (2-4) is

$$g_{k+1} = g_k + t_k Q d_k, \qquad d_{k+1} = -g_{k+1} + s_k d_k.$$




In our case $t_k = 8/5$ and $s_k = 9/25$ for all $k$. Furthermore, the relations

$$g_{k+1} = rR\,g_k \qquad\text{and}\qquad d_{k+1} = rR\,d_k$$

hold for all $k$, where $r = 3/5$ and $R$ is the orthogonal matrix

$$R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1/5 & -2\sqrt{6}/5 \\ 0 & 2\sqrt{6}/5 & -1/5 \end{pmatrix}.$$

Thus $g_k = (rR)^k g_0$ and $d_k = (rR)^k d_0$ for all $k$. Each successive application of the matrix $rR$ rotates the gradient and the direction through an angle $\arccos(-1/5)$ around the long axis of the three-dimensional ellipsoid $x^T Q x = 1$, and diminishes both of these vectors in magnitude by the factor $r = 3/5$. Since $R$ commutes with $Q$ and $f(x) = \tfrac{1}{2} g^T Q^{-1} g$, the ratio $f(x_{k+1})/f(x_k)$ is therefore $r^2 = 9/25$ for all $k$.
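These constants can be checked numerically. The Python sketch below (standard library only) runs a few steps of (2)-(4) from the data $Q$, $g_0$, $d_0$ of the example and records $t_k$, $s_k$, $f(x_{k+1})/f(x_k)$, and the deviation of $g_{k+1}$ from $rR\,g_k$. The step count and the tolerances in the check are arbitrary choices of the sketch; only a handful of steps is run, so that floating-point drift away from the exact periodic orbit stays negligible.

```python
import math

# Data of the example: Q = diag(0.1, 1, 1), g_0, d_0, r and R as stated above.
SQ5, SQ6 = math.sqrt(5.0), math.sqrt(6.0)
Q_DIAG = [0.1, 1.0, 1.0]
G0 = [1.0 / SQ6, -SQ5 / SQ6, 0.0]
D0 = [c / (4.0 * math.sqrt(30.0)) for c in (-10.0 * SQ5, 14.0, -3.0 * SQ6)]
R = [[1.0, 0.0, 0.0],
     [0.0, -0.2, -2.0 * SQ6 / 5.0],
     [0.0, 2.0 * SQ6 / 5.0, -0.2]]
r = 0.6


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def f_of(g):
    """f(x) = x^T Q x / 2, written in terms of the gradient g = Q x."""
    return 0.5 * sum(gi * gi / qi for gi, qi in zip(g, Q_DIAG))


def run_example(steps):
    """Return (t_k, s_k, f(x_{k+1})/f(x_k), max |g_{k+1} - rR g_k|) for each step."""
    g, d = list(G0), list(D0)
    history = []
    for _ in range(steps):
        Qd = [qi * di for qi, di in zip(Q_DIAG, d)]
        dQd = dot(d, Qd)
        t = -dot(g, d) / dQd                                  # formula (3)
        g_next = [gi + t * qdi for gi, qdi in zip(g, Qd)]     # first half of (2)
        s = dot(g_next, Qd) / dQd                             # formula (4)
        d_next = [-gi + s * di for gi, di in zip(g_next, d)]  # second half of (2)
        rRg = [r * dot(row, g) for row in R]                  # predicted g_{k+1}
        err = max(abs(a - b) for a, b in zip(g_next, rRg))
        history.append((t, s, f_of(g_next) / f_of(g), err))
        g, d = g_next, d_next
    return history


if __name__ == "__main__":
    for t, s, ratio, err in run_example(6):
        print(f"t={t:.12f}  s={s:.12f}  ratio={ratio:.12f}  err={err:.1e}")
```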


4.  THEOREM:  CONVERGENCE  IS  AT  WORST  LINEAR 
To  bound  the  rate  of  convergence  of  the  nonrestarted  conjugate  gradient 
method  from  both  sides,  we  will  show  that  its  convergence  is  at  worst 
linear. 

Let $f(x) = \tfrac{1}{2} x^T Q x$, with $Q$ symmetric and positive definite (we can always transform the original problem so that it has this form). Since $g = \nabla f(x) = Qx$, $f(x) = \tfrac{1}{2} g^T Q^{-1} g$. The minimum of $f$ along any line $x + td$ is given by $t = -g^T d / d^T Q d$ (compare with formula (3), suppressing "$k$"). Setting $x_+ = x + td$ and $g_+ = \nabla f(x_+)$, we have

$$2f(x_+) = g_+^T Q^{-1} g_+ = (g + tQd)^T Q^{-1} (g + tQd) = g^T Q^{-1} g - (g^T d)^2 / d^T Q d.$$
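The last equality follows by expanding the quadratic form (using the symmetry of $Q$, so that the cross term is $2t\, g^T Q^{-1} Q d = 2t\, g^T d$) and then substituting the minimizing $t$:

$$\begin{aligned}
(g + tQd)^T Q^{-1} (g + tQd)
  &= g^T Q^{-1} g + 2t\, g^T d + t^2\, d^T Q d \\
  &= g^T Q^{-1} g - 2\,\frac{(g^T d)^2}{d^T Q d} + \frac{(g^T d)^2}{d^T Q d}
     \qquad \left(t = -\frac{g^T d}{d^T Q d}\right) \\
  &= g^T Q^{-1} g - \frac{(g^T d)^2}{d^T Q d}.
\end{aligned}$$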


We consider two cases:

(i) $d = -g$; that is, the step is an ordinary steepest descent step. Then

$$2f(x_+) = g^T Q^{-1} g - (g^T g)^2 / g^T Q g.$$

(ii) The point $x$ was obtained by minimizing $f$ along some line having the direction $c$, whence $g^T c = 0$, and then the direction $d$ was obtained as in formulas (2, 4), so that

$$d = -g + sc \qquad\text{and}\qquad d^T Q c = 0.$$

Then $g^T d = -g^T g$ and, using $d^T Q c = 0$ for the first equality and $s = g^T Q c / c^T Q c$ from (4) for the last,

$$d^T Q d = -g^T Q d = -g^T(-Qg + sQc) = g^T Q g - s\, g^T Q c = g^T Q g - (g^T Q c)^2 / c^T Q c.$$

We see that $d^T Q d \le g^T Q g$.

Since in case (ii) $2f(x_+) = g^T Q^{-1} g - (g^T g)^2 / d^T Q d$, the resulting value of $f$ is no higher in case (ii) than in case (i); the fact that the direction $d$ was obtained by conjugating $-g$ with respect to the previous direction, rather than taking it to be $-g$ itself, has not hurt. Thus each step of the continued conjugate gradient method decreases the function at least as much as would one step of steepest descent taken at the same point.
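The comparison between cases (i) and (ii) can be spot-checked numerically. The Python sketch below draws random pairs $(g, c)$ with $g^T c = 0$ for a fixed symmetric positive definite $Q$ (both the matrix and the Gaussian sampling are assumptions of the sketch), forms $d = -g + sc$ with $s$ from (4), and compares the decreases $f(x) - f(x_+) = \tfrac{1}{2} (g^T d)^2 / d^T Q d$ obtained along $-g$ and along $d$.

```python
import random


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def matvec(Q, v):
    """Product Q v, with Q given as a list of rows."""
    return [dot(row, v) for row in Q]


def decrease(Q, g, d):
    """f(x) - f(x_+) after exact minimization along d: (g^T d)^2 / (2 d^T Q d)."""
    return 0.5 * dot(g, d) ** 2 / dot(d, matvec(Q, d))


def compare_steps(Q, g, c):
    """Return (decrease along -g, decrease along the conjugated direction d)."""
    Qc = matvec(Q, c)
    s = dot(g, Qc) / dot(c, Qc)                  # formula (4) with g = g_{k+1}, c = d_k
    d = [-gi + s * ci for gi, ci in zip(g, c)]   # d = -g + s c, so that d^T Q c = 0
    return decrease(Q, g, [-gi for gi in g]), decrease(Q, g, d)


def random_orthogonal_pair(rng, n):
    """Random g and c with g^T c = 0 (g is projected off c)."""
    c = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lam = dot(h, c) / dot(c, c)
    return [hi - lam * ci for hi, ci in zip(h, c)], c


if __name__ == "__main__":
    rng = random.Random(1)
    Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    for _ in range(5):
        g, c = random_orthogonal_pair(rng, 3)
        sd, cg = compare_steps(Q, g, c)
        print(f"steepest descent {sd:.6f}   conjugated {cg:.6f}")
```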

The inequality

$$f(x_{k+1}) / f(x_k) \le \left(\frac{A - 1}{A + 1}\right)^2$$

is known to hold for steepest descent, where $A$ is the condition number of the matrix $Q$ (namely, the ratio of the largest to smallest eigenvalue). It follows that the inequality also holds for the conjugate gradient method, so that its convergence is at worst linear.
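For the example of Section 3 the two bounds can be compared directly: $Q$ has eigenvalues $0.1, 1, 1$, so $A = 10$ and the steepest descent factor is $(9/11)^2 = 81/121 \approx 0.669$, while the observed ratio is $9/25 = 0.36$. The short Python computation below reproduces these numbers in exact rational arithmetic with the standard `fractions` module; the function name is illustrative.

```python
from fractions import Fraction


def steepest_descent_factor(eigenvalues):
    """((A - 1) / (A + 1))^2 with A = largest / smallest eigenvalue of Q."""
    A = Fraction(max(eigenvalues)) / Fraction(min(eigenvalues))
    return ((A - 1) / (A + 1)) ** 2


bound = steepest_descent_factor([Fraction(1, 10), Fraction(1), Fraction(1)])
observed = Fraction(9, 25)          # f(x_{k+1}) / f(x_k) in the example
print(bound, float(bound), observed <= bound)
```

The observed ratio lies strictly between 0 and the worst-case factor, consistent with convergence that is neither better nor worse than linear.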